Dataset Integration

Datasets were integrated using the method ‘Linked Inference of Genomic Experimental Relationships (LIGER)’ (Welch et al. 2019). This involved four pre-processing steps: (1) normalization for UMIs per cell, (2) subsetting the most variable genes for each dataset, (3) scaling by root-mean-square across cells, and (4) filtering of non-expressed genes.
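As an illustration of steps (1), (3) and (4), the following is a minimal sketch in plain Python, not the scFlow or LIGER implementation. The function names and the toy count matrix are hypothetical, rows are assumed to be cells and columns genes, and variable-gene selection (step 2) is omitted because its selection criterion is not stated in this report.

```python
import math

def normalize_by_umis(counts):
    """Divide each cell's counts (one row per cell) by that cell's total UMIs."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([c / total if total else 0.0 for c in cell])
    return normalized

def scale_by_rms(matrix):
    """Scale each gene (column) by its root-mean-square across cells, without centering."""
    n_cells, n_genes = len(matrix), len(matrix[0])
    rms = [math.sqrt(sum(row[g] ** 2 for row in matrix) / n_cells)
           for g in range(n_genes)]
    return [[row[g] / rms[g] if rms[g] else 0.0 for g in range(n_genes)]
            for row in matrix]

def drop_unexpressed_genes(matrix):
    """Remove genes (columns) that are zero in every cell; also return kept column indices."""
    n_genes = len(matrix[0])
    kept = [g for g in range(n_genes) if any(row[g] > 0 for row in matrix)]
    return [[row[g] for g in kept] for row in matrix], kept

# Hypothetical toy data: 2 cells x 3 genes; the third gene is never expressed.
toy_counts = [[1, 3, 0], [2, 2, 0]]
normalized = normalize_by_umis(toy_counts)
scaled = scale_by_rms(normalized)
filtered, kept_genes = drop_unexpressed_genes(scaled)
```

After scaling, every retained gene has a root-mean-square of 1 across cells, so genes with very different expression levels contribute on a comparable scale while values stay non-negative.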

Key pre-processing parameters:

Note: The length of the union across datasets (individuals) varies. The Venn and UpSet plots below may reveal one or more outlying datasets.

UpSet chart of selected variable genes: The first 25 vertical bars show, for each dataset, the number of variable genes selected in that dataset alone, out of the total variable genes used for integration.
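The quantity behind those per-dataset bars can be sketched as follows, assuming that "isolated dataset participation" means genes selected as variable in exactly one dataset. The function name, dataset names and gene names are hypothetical and not taken from this report.

```python
def exclusive_gene_counts(variable_genes):
    """For each dataset, count the genes selected in that dataset and in no other."""
    counts = {}
    for name, genes in variable_genes.items():
        # Union of the variable genes of every other dataset.
        others = set().union(
            *(g for other, g in variable_genes.items() if other != name)
        )
        counts[name] = len(set(genes) - others)
    return counts

# Hypothetical variable-gene selections for three individuals.
selections = {
    "individual_1": {"GENE_A", "GENE_B", "GENE_C"},
    "individual_2": {"GENE_B", "GENE_C", "GENE_D"},
    "individual_3": {"GENE_C", "GENE_E", "GENE_F"},
}
exclusive = exclusive_gene_counts(selections)
union_size = len(set().union(*selections.values()))
```

A dataset whose exclusive count is much larger than the others contributes many genes that no other dataset supports, which is one way an outlying dataset could show up in the chart.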

LIGER Factorization

An integrative non-negative matrix factorization was performed in order to identify shared and distinct metagenes (factors) across the datasets. The corresponding factor/metagene loadings were calculated for each cell.

Key factorization parameters:

  • Number of factors (inner dimension of factorization; k): 20

  • Penalty parameter which limits the dataset-specific component of the factorization (lambda): 5

  • Resolution parameter which controls the number of communities detected: 1
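To show how k and lambda enter the factorization, here is a minimal sketch of the objective being minimised, assuming the form given in Welch et al. (2019): the sum over datasets of ||E_i - H_i(W + V_i)||² + lambda ||H_i V_i||² (squared Frobenius norms). E_i is the cells × genes matrix of dataset i, H_i the cells × k factor loadings, W the shared k × genes metagenes and V_i the dataset-specific k × genes metagenes. The helper names and toy matrices are hypothetical, and the sketch only evaluates the objective; it does not perform the optimisation or enforce non-negativity.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def frobenius_sq(m):
    """Squared Frobenius norm of a matrix."""
    return sum(v * v for row in m for v in row)

def inmf_objective(E, H, W, V, lam):
    """Evaluate the iNMF objective for per-dataset lists E, H, V and a shared W."""
    total = 0.0
    for E_i, H_i, V_i in zip(E, H, V):
        # Shared plus dataset-specific metagenes (k x genes).
        combined = [[w + v for w, v in zip(w_row, v_row)]
                    for w_row, v_row in zip(W, V_i)]
        recon = matmul(H_i, combined)
        resid = [[e - r for e, r in zip(e_row, r_row)]
                 for e_row, r_row in zip(E_i, recon)]
        # Reconstruction error plus the penalty on the dataset-specific part.
        total += frobenius_sq(resid) + lam * frobenius_sq(matmul(H_i, V_i))
    return total

# Toy example with k = 2 factors, 2 genes and one cell in a single dataset.
W = [[1.0, 0.0], [0.0, 1.0]]
H = [[[1.0, 2.0]]]
V = [[[1.0, 0.0], [0.0, 0.0]]]
E = [[[2.0, 2.0]]]
objective = inmf_objective(E, H, W, V, lam=5)
```

In this toy case the reconstruction is exact, so the whole objective comes from the penalty term. A larger lambda makes dataset-specific metagenes more costly and pushes the solution towards shared structure.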

Batch Effect Correction by LIGER

The performance of LIGER in batch effect correction was evaluated by comparison with the same data without a data integration algorithm applied (i.e. PCA input for dimensionality reduction). For each of the categorical covariates specified by the user, two side-by-side comparisons are shown: (1) visualisation of the batch effect using tSNE plots, and (2) quantification of the batch effect based on kBET test results (Büttner et al. 2019).

In each kBET plot, the rejection rate represents the fraction of neighbourhoods with a label composition different from the global composition of batch labels. An observed rejection rate that differs significantly from the expected rejection rate indicates that the data are not well mixed.
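The idea of a rejection rate can be sketched as below. This is a simplified illustration, not the kBET implementation: it assumes the neighbourhoods are already given as lists of batch labels, uses a Pearson chi-square statistic per neighbourhood, and compares it with a fixed critical value (3.841, the 5% threshold for one degree of freedom, i.e. two batches) instead of computing p-values or an expected rejection rate. All names and data are hypothetical.

```python
from collections import Counter

def chi_square_stat(neighbour_labels, global_freq):
    """Pearson chi-square statistic of a neighbourhood's labels against global frequencies."""
    k = len(neighbour_labels)
    observed = Counter(neighbour_labels)
    return sum((observed.get(batch, 0) - k * freq) ** 2 / (k * freq)
               for batch, freq in global_freq.items())

def rejection_rate(neighbourhoods, all_labels, critical_value=3.841):
    """Fraction of neighbourhoods whose label composition differs from the global one."""
    n = len(all_labels)
    global_freq = {batch: count / n for batch, count in Counter(all_labels).items()}
    rejected = sum(chi_square_stat(nb, global_freq) > critical_value
                   for nb in neighbourhoods)
    return rejected / len(neighbourhoods)

# Hypothetical data: two equally sized batches and four neighbourhoods of four cells.
labels = ["a"] * 10 + ["b"] * 10
neighbourhoods = [
    ["a", "a", "b", "b"],  # mixed like the global composition
    ["a", "a", "a", "a"],  # single batch only
    ["a", "b", "a", "b"],  # mixed
    ["b", "b", "b", "b"],  # single batch only
]
rate = rejection_rate(neighbourhoods, labels)
```

Here the two single-batch neighbourhoods are rejected and the two mixed ones are not. A rate close to 0 would suggest well-mixed batches, and a rate close to 1 a strong batch effect.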

Categorical covariates

group

Figure: tSNE by group

Figure: tSNE (Liger) by group

Figure: kBET by group

Figure: kBET (Liger) by group

sex

Figure: tSNE by sex

Figure: tSNE (Liger) by sex

Figure: kBET by sex

Figure: kBET (Liger) by sex

individual

Figure: tSNE by individual

Figure: tSNE (Liger) by individual

Figure: kBET by individual

Figure: kBET (Liger) by individual

References

Büttner, Maren, Zhichao Miao, F. Alexander Wolf, Sarah A. Teichmann, and Fabian J. Theis. 2019. “A test metric for assessing single-cell RNA-seq batch correction.” Nature Methods 16 (1): 43–49. https://doi.org/10.1038/s41592-018-0254-1.

Welch, Joshua D., Velina Kozareva, Ashley Ferreira, Charles Vanderburg, Carly Martin, and Evan Z. Macosko. 2019. “Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity.” Cell 177 (7). Cell Press: 1873–1887.e17. https://doi.org/10.1016/j.cell.2019.05.006.


scFlow v0.5.0 – 2020-05-15 22:27:38

 
